dot-product attention
Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems
The Transformer and its variants have proven to be efficient sequence learners across many different domains. Despite their staggering success, two critical issues remain: the enormous number of parameters that must be trained (ranging from $10^7$ to $10^{11}$) and the quadratic complexity of dot-product attention. In this work, we investigate approximating the two central components of the Transformer --- multi-head self-attention and point-wise feed-forward transformation --- with a reduced parameter space and computational complexity. We build upon recent developments in analyzing deep neural networks as numerical solvers of ordinary differential equations. Exploiting an analogy between Transformer stages and the evolution of a dynamical system of multiple interacting particles, we formulate a temporal evolution scheme, \name, that bypasses costly dot-product attention over multiple stacked layers. We perform exhaustive experiments with \name\ on well-known encoder-decoder as well as encoder-only tasks. We observe that the degree of approximation (or, inversely, the degree of parameter reduction) affects performance differently depending on the task: in the encoder-decoder regime, \name\ delivers performance comparable to the original Transformer, while on encoder-only tasks it consistently outperforms the Transformer and several subsequent variants.
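The two ideas the abstract combines can be sketched in a few lines of NumPy: standard scaled dot-product attention materializes an $n \times n$ score matrix (the quadratic cost the paper targets), and a residual layer $x + hF(x)$ can be read as one Euler step of the ODE $dx/dt = F(x)$, which is the dynamical-systems view the paper builds on. This is a minimal illustration of those two generic building blocks, not the paper's \name\ scheme; the step size `h` is an illustrative assumption.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention: the (n, n) score matrix makes the cost O(n^2 d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) -- quadratic in n
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # row-stochastic weights
    return w @ V

def euler_layer(x, F, h=1.0):
    """A residual layer x + h*F(x), read as one Euler step of dx/dt = F(x)."""
    return x + h * F(x)

rng = np.random.default_rng(0)
n, d = 6, 4
x = rng.standard_normal((n, d))
out = euler_layer(x, lambda z: scaled_dot_product_attention(z, z, z))
print(out.shape)  # (6, 4)
```

Stacking such layers corresponds to integrating the particle system forward in time; schemes that reuse or interpolate the expensive attention term across steps are what allow the dot product to be computed less often than once per layer.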
We would like to thank all of the reviewers for their time and thoughtful comments on our paper. MAAC uses critic attention only to reduce the state-space representation; to substantiate this claim, we performed an analysis of TarMAC. More importantly, SARNet relies on a dedicated memory unit. We have described it in Appendix A.1.4 and will add further details. However, we did not see performance gains for the tasks in the paper; we will note results with gates in the revision. SARNet's performance is substantially better than the baselines when the task becomes harder (more agents).
Stochastic Clock Attention for Aligning Continuous and Ordered Sequences
We formulate an attention mechanism for continuous and ordered sequences that explicitly functions as an alignment model, which serves as the core of many sequence-to-sequence tasks. Standard scaled dot-product attention relies on positional encodings and masks but does not enforce continuity or monotonicity, which are crucial for frame-synchronous targets. We attach learned nonnegative \emph{clocks} to the source and target and model attention as the meeting probability of these clocks; a path-integral derivation yields a closed-form, Gaussian-like scoring rule with an intrinsic bias toward causal, smooth, near-diagonal alignments, without external positional regularizers. The framework supports two complementary regimes: normalized clocks for parallel decoding when a global length is available, and unnormalized clocks for autoregressive decoding; both are nearly parameter-free, drop-in replacements. In a Transformer text-to-speech testbed, this construction produces more stable alignments and improved robustness to global time-scaling while matching or exceeding the accuracy of scaled dot-product baselines. We hypothesize applicability to other continuous targets, including video and temporal-signal modeling.
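The core construction can be sketched generically: cumulate nonnegative increments into a monotone "clock" per sequence, then score each target-source pair by a Gaussian-like function of the clock difference, which biases the weights toward a smooth, near-diagonal alignment. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation; the softplus parameterization and the width `sigma` are illustrative choices, and the normalized regime shown assumes a known global length.

```python
import numpy as np

def monotone_clock(raw, normalize=True):
    """Cumulate nonnegative (softplus) increments into a monotone clock."""
    inc = np.log1p(np.exp(raw))        # softplus: increments strictly > 0
    clock = np.cumsum(inc)
    if normalize:                      # normalized regime: clock runs 0 -> 1
        clock = clock / clock[-1]
    return clock

def clock_alignment(src_clock, tgt_clock, sigma=0.05):
    """Gaussian-like score in clock difference -> smooth, near-diagonal weights."""
    diff = tgt_clock[:, None] - src_clock[None, :]     # (T, S) clock gaps
    scores = -0.5 * (diff / sigma) ** 2
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)           # rows sum to 1

rng = np.random.default_rng(1)
src = monotone_clock(rng.standard_normal(8))
tgt = monotone_clock(rng.standard_normal(5))
A = clock_alignment(src, tgt)
print(A.shape)  # (5, 8)
```

Because both clocks are monotone by construction, the alignment cannot jump backward in the way unconstrained dot-product scores can, which is the property the abstract highlights for frame-synchronous targets.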